Bayes Rule

We're going to talk about perhaps the holy grail of probabilistic inference. It's called Bayes Rule. Bayes Rule is named after Reverend Thomas Bayes, who used this principle to infer the existence of God, and in doing so, he created a new family of methods that has vastly influenced artificial intelligence and statistics.

Quiz: Cancer Test

  • Let's use the cancer example from my last unit. There's a specific cancer that occurs in 1% of the population, and there is a test for this cancer: with 90% chance it is positive if you have this cancer, C. That's usually called the sensitivity.
  • But the test sometimes is positive even if you don't have C. Let's say with 90% chance it's negative if we don't have C. That's usually called the specificity.
  • So here's my question. Without further symptoms, you take the test, and the test comes back positive. What do you think is the probability of having that specific type of cancer?
  • To answer this, let's draw a diagram. Suppose these are all the people, and some of them, exactly 1%, have cancer, while 99% are cancer-free. We know there's a test that, if you have cancer, correctly diagnoses it with 90% chance. So if we draw the area where the test is positive, cancer and test positive, then this area over here is 90% of the cancer circle. However, this isn't the full truth.
  • The test also comes out positive even if the person doesn't have cancer. In fact, in our case, that happens in 10% of all cases. So we have to add more area: as big as 10% of this large non-cancer area, where the test goes positive but the person doesn't have cancer. So this blue area is 10% of all the area over here, minus the little small cancer circle. And clearly, all the area outside these circles corresponds to the situation of no cancer and a negative test.
  • So let me ask you again. Suppose we have a positive test. What do you think? With a prior probability of cancer of 1% and a sensitivity and specificity of 90%, do you think your new chances are now 90%, or 8%, or still just 1%?
  • The question being asked is this: 1% of the population has cancer. Given that there is a 90% chance that you will test positive if you have cancer and that there is a 90% chance you will test negative if you don't have cancer, what is the probability that you have cancer if you test positive?

Screenshot taken from Udacity

Answer

  • And I would argue it's about 8%. In fact, as we will see, it comes out at 8 1/3% mathematically. And the way to see this in the diagram is that this is the region that tests positive. By having a positive test, you know you're in this region, and nothing else matters.
  • You know you're in this circle. But within this circle, the ratio of the cancerous region relative to the entire region is still pretty small. Obviously, having a positive test changes your cancer probability, but it only increases it by a factor of about 8, as we will see in a second.

Screenshot taken from Udacity

Quiz: Prior and Posterior

  • So this is the essence of Bayes Rule, which I'll give to you in a second. There's some sort of a prior, by which we mean the probability before you run a test, and then you get some evidence from the test itself, and that all leads you to what's called a posterior probability.
  • Now this is not really a plus operation. In fact, in reality, it's more like a multiplication, but semantically, what Bayes Rule does is it incorporates some evidence from the test into your prior probability to arrive at a posterior probability.
  • So let's make this specific. In our cancer example, we know that the prior probability of cancer is 0.01, which is the same as 1%. The posterior probability of cancer given that our test is positive, abbreviated here as Pos, is the product of the prior times our test sensitivity, which asks: what is the chance of a positive result given that I have cancer?
  • And you might remember, this was 0.9, or 90%. Now just to warn you, this isn't quite correct. To make this correct, we also have to compute the same product for the non-cancer option: no cancer given a positive test. Using the prior, we know that P of not C is 0.99, which is 1 minus P of C, and we multiply it by the probability of getting a positive test result given not C.
  • Realize these 2 equations are the same, except I exchanged C for not C. And this one over here takes a moment to compute. We know that our test gives us a negative result in the cancer-free case with 0.9 chance; as a result, it gives us a positive result in the cancer-free case with 10% chance.
  • Now what's interesting is that this is about the correct equation, except the probabilities don't add up to 1. To see this, I'm going to ask you to compute them. So please give me the exact numbers for the first expression and the second expression written over here, using our example up there.

Screenshot taken from Udacity

Answer

  • Obviously, P(C) x P(Pos|C) = 0.01 x 0.9 = 0.009, whereas this guy over here, P(¬C) x P(Pos|¬C) = 0.99 x 0.1 = 0.099.
  • What we've computed here is the absolute area in here, which is 0.009, and the absolute area in here, which is 0.099.
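These two products can be checked in a couple of lines of Python (a quick sketch; the variable names are mine, not from the course):

```python
# Cancer example: prior 1%, sensitivity 90%, specificity 90%.
p_c = 0.01              # P(C)
p_pos_given_c = 0.9     # P(Pos | C), the sensitivity
p_pos_given_not_c = 0.1 # P(Pos | not C), which is 1 - specificity

# Unnormalized "areas" for a positive test result.
joint_c = p_c * p_pos_given_c                # P(C, Pos)     ~ 0.009
joint_not_c = (1 - p_c) * p_pos_given_not_c  # P(not C, Pos) ~ 0.099
print(joint_c, joint_not_c)
```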

Screenshot taken from Udacity

Quiz: Normalizing

The normalization proceeds in two steps. We just normalize these guys to keep their ratio the same but make sure they add up to 1. So let's first compute the sum of these two guys.

Answer

  • And, yes, the answer is 0.108. Technically, what this really means is the probability of a positive test result--that's the area in the circle that I just marked. By virtue of what we learned earlier, it's just the sum of the two things over here, 0.009 + 0.099, which gives us 0.108.

Screenshot taken from Udacity

And now finally, we come up with the actual posterior, whereas this one over here is often called the joint probability of the two events. The posterior is obtained by dividing this guy over here by the normalizer. So let's do this over here--let's divide this guy over here by the normalizer to get the posterior probability of having cancer given that I received the positive test result.

Answer

  • The answer is 0.0833.

Screenshot taken from Udacity

Let's do the same for the non-cancer version, pick the number over here to divide and divide it by this same normalizer.

Answer

  • The answer is 0.9167 approximately.
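Putting the last few quizzes together, the whole positive-test computation fits in a few lines of Python (a sketch; the variable names are mine):

```python
# Positive-test case of the cancer example.
p_c, sensitivity, specificity = 0.01, 0.9, 0.9

joint_c = p_c * sensitivity                  # P(C, Pos)         ~ 0.009
joint_not_c = (1 - p_c) * (1 - specificity)  # P(not C, Pos)     ~ 0.099
p_pos = joint_c + joint_not_c                # normalizer P(Pos) ~ 0.108

posterior_c = joint_c / p_pos          # ~0.0833, the 8 1/3% from above
posterior_not_c = joint_not_c / p_pos  # ~0.9167
```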

Screenshot taken from Udacity

Quiz: Total Probability

Why don't you for a second add these two numbers and give me the result?

  • The answer is 1

Screenshot taken from Udacity

Bayes Rule Diagram

  • Well, we really said that we had a situation with a prior P(C), a test with a certain sensitivity P(Pos|C), and a certain specificity P(Neg|¬C). When you receive, say, a positive test result, what you do is take your prior P(C) and multiply in the probability of this test result given C, and take P(¬C) and multiply in the probability of the test result given ¬C.
  • So, this is your branch for the consideration that you have cancer, and this is your branch for the consideration of no cancer. When you're done with this, you arrive at a number that combines the hypothesis with the test result, once for the cancer hypothesis and once for the no-cancer hypothesis. Now, what you do is add those up, and they normally don't add up to one. You get a certain quantity, which happens to be the total probability that the test is what it was, in this case positive.
  • And all you do next is divide, or normalize, this thing over here by the sum over here, and the same on the right side. The divider is the same for both cases: this is your cancer branch and this your non-cancer branch, but the normalizer does not depend on the cancer variable anymore. What you now get out is the desired posterior probability, and those add up to 1 if you did everything correctly, as shown over here. This is the algorithm for Bayes Rule.
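The algorithm just described can be sketched as one small Python function (my own sketch of the procedure, not code from the course):

```python
def bayes_posterior(prior, p_result_given_h, p_result_given_not_h):
    """Posterior (P(H | result), P(not H | result)) for a binary hypothesis H."""
    branch_h = prior * p_result_given_h                # hypothesis branch
    branch_not_h = (1 - prior) * p_result_given_not_h  # opposite branch
    total = branch_h + branch_not_h  # total probability of this test result
    return branch_h / total, branch_not_h / total

# Cancer example with a positive test: prior 0.01, P(Pos|C) = 0.9, P(Pos|not C) = 0.1.
print(bayes_posterior(0.01, 0.9, 0.1))  # ~(0.0833, 0.9167)
```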

Screenshot taken from Udacity

Equivalent Diagram

Now, the same algorithm works if your test says negative. Suppose your test result says negative. You could still ask the same question:

  • Now, what's my probability of having cancer or not? But now all the positives in here become negatives.
  • The sum is the total probability of a negative test result, and if we now divide by this score, we get the posterior probabilities for cancer and non-cancer assuming you had a negative test result, which of course should be much, much more favorable for you, because none of us wants to have cancer.
  • So, look at this for a while, and let's now do the calculation for the negative case using the same numbers I gave you before, step by step this time around, so I can really guide you through the process.

Screenshot taken from Udacity

Quiz: Cancer Probabilities

We begin with our prior probability, our sensitivity, and our specificity, and I want you to begin by filling in all the missing values. So, there's the probability of no cancer, the probability of negative (which is the negation of positive) given C, and the probability of positive given not C.

Answer

  • And obviously, these are still 0.99 as before, 0.1, and 0.1. I hope you got this correct.

Screenshot taken from Udacity

Quiz: Probability Given Test

Now assume the test comes back negative, the same logic applies as before. So please give me the combined probability of cancer given the negative test result and the combined probability of being cancer-free given the negative test result.

Answer

  • The number here is 0.001 and it's the product of my prior for cancer which is 0.01, and the probability of getting a negative result in the case of cancer which is right over here, 0.1. If I multiply these two things together, I get 0.001.
  • The probability here is 0.891. What I'm multiplying is the prior probability of not having cancer, which is 0.99, with the probability of seeing a negative result in the case of not having cancer, and that is the one right over here, 0.9. So, multiplying 0.99 with 0.9, I actually get 0.891.

Screenshot taken from Udacity

Quiz: Normalizer

Let's compute the normalizer. You now remember what this was.

Answer

  • And the answer is 0.892. You just add up these two values over here.

Screenshot taken from Udacity

Quiz: Normalizing Probability

Now finally, tell me: what is the posterior probability of cancer given that we know we had a negative test result, and what is the probability of being cancer-free given the negative test result?

Answer

  • This is approximately 0.0011, which we get by dividing 0.001 by the normalizer 0.892, and the posterior probability of being cancer-free after the test is approximately 0.9989, and that's obtained by dividing this probability over here by the normalizer and not surprisingly, these two values indeed add up to 1.
  • Now, what's remarkable about this outcome is really what it means. Before the test, we had a 1% chance of having cancer; now, we have about a 0.11% chance of having cancer. So, our cancer probability went down by about a factor of 9.
  • So, the test really helped us gain confidence that we are cancer-free. Conversely, before we had a 99% chance of being cancer-free; now it's 99.89%. So, all the numbers work exactly how we expect them to work.
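The same steps for the negative-test case can be run as a short sketch (my own variable names):

```python
# Negative-test case of the cancer example.
p_c, sensitivity, specificity = 0.01, 0.9, 0.9

joint_c = p_c * (1 - sensitivity)      # P(C, Neg)         ~ 0.001
joint_not_c = (1 - p_c) * specificity  # P(not C, Neg)     ~ 0.891
p_neg = joint_c + joint_not_c          # normalizer P(Neg) ~ 0.892

posterior_c = joint_c / p_neg          # ~0.0011
posterior_not_c = joint_not_c / p_neg  # ~0.9989
```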

Screenshot taken from Udacity

Quiz: Disease Test 1

Let me now make your life harder. Suppose the probability of a certain other kind of disease is 0.1, so 10% of the population has it. Our test in the positive case is really informative, with a 0.9 chance of coming out positive if you have the disease, but there's only a 0.5 chance that if you're disease-free the test, indeed, says the same thing. So the sensitivity is high, but the specificity is lower. And let's start by filling in the first 3 of them.

Answer

  • Obviously, these are just 1 minus those: 0.9, 0.1, and 0.5.

Screenshot taken from Udacity

Quiz: Disease Test 2

What is P(C, Neg)?

Answer

  • And the answer is 0.01.
  • P(C) = 0.1, and P(Neg|C) is also 0.1, so if you multiply those two you get 0.01.

Screenshot taken from Udacity

Quiz: Disease Test 3

And what is the same thing for P(¬C, Neg)?

Answer

  • And the answer is 0.45.
  • P(¬C) is 0.9, and P(Neg|¬C) is 0.5. So 0.9 * 0.5 = 0.45.

Screenshot taken from Udacity

Quiz: Disease Test 4

What is P(Neg)?

Answer

  • Well, you just add up these two numbers to get 0.46.

Screenshot taken from Udacity

Quiz: Disease Test 5

So tell me what the final two numbers are.

Answer

  • The first one is 0.01 divided by the normalizer 0.46, which gives us 0.0217, and the second one is 0.45 over here divided by 0.46, which gives us 0.9783.
  • These are the correct posteriors. We started with a 10% chance of having cancer; after the negative result, we're down now to about 2%.

Screenshot taken from Udacity

Quiz: Disease Test 6

Let's now consider the case that the test result is positive, and I want you to just give me the two numbers over here and not the other ones.

Answer

  • So once again, we have 0.9, 0.1, and 0.5 over here.
  • Very quickly: multiplying this guy with this girl over here gives 0.09, and this guy with this girl over here gives 0.45. Adding them up gives us 0.54, and dividing those correspondingly, 0.09 divided by 0.54 gives us 0.1666 and so on, and 0.8333 and so on for dividing 0.45 by 0.54.
  • And what this means is that with the positive test result, our chance of cancer increased from 0.1 to about 0.17. Obviously, our chance of having no cancer decreased accordingly.
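Both branches of this disease example can be verified in a few lines (a sketch with my own names):

```python
# Disease example: prior 10%, sensitivity 0.9, specificity 0.5.
prior, sensitivity, specificity = 0.1, 0.9, 0.5

# Negative test result.
joint_c_neg = prior * (1 - sensitivity)   # P(C, Neg)     ~ 0.01
joint_nc_neg = (1 - prior) * specificity  # P(not C, Neg) ~ 0.45
p_neg = joint_c_neg + joint_nc_neg        # ~0.46

# Positive test result.
joint_c_pos = prior * sensitivity               # P(C, Pos)     ~ 0.09
joint_nc_pos = (1 - prior) * (1 - specificity)  # P(not C, Pos) ~ 0.45
p_pos = joint_c_pos + joint_nc_pos              # ~0.54

print(joint_c_neg / p_neg)  # ~0.0217 after a negative test
print(joint_c_pos / p_pos)  # ~0.1667 after a positive test
```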

Screenshot taken from Udacity

Bayes Rule Summary

  • In Bayes Rule, we have a hidden variable we care about--whether we have cancer or not--but we can't measure it directly; instead, we have a test. We have a prior for how frequently this variable is true, and the test is generally characterized by how often it says positive when the variable is true and how often it says negative when the variable is false.
  • Bayes Rule takes the prior and multiplies in the measurement--which in this case we assume to be the positive measurement--to give us a new value, and it does the same for the actual measurement given the opposite assumption about our hidden cancer variable; that multiplication gives us this guy over here.
  • We add those two things up, which gives us the normalizer, and then we divide these guys by it to arrive at the best estimate of the hidden variable C given our test result. In this example, I used a positive test result, but we might do the same with a negative one.
  • This was exactly the same as in our diagram in the beginning. There was a prior: in our case, the chance that this specific variable is true. We noticed that inside this prior there is a region where our test result applies. We also noticed that the test result can occur when the condition is not fulfilled.
  • So, this expression over here and this expression over here correspond exactly to the red area over here and the green area over here. But then we noticed that these two areas don't add up to 1. The reason is that there's lots of stuff outside, so we calculated the total area, which was this expression over here, P(Pos).
  • And then we normalized these two things over here by the total area to get the relative area assigned to the red thing versus the green thing, this time by just dividing by the total area in this region over here, thereby getting rid of all the other cases.

Screenshot taken from Udacity

Quiz: Robot Sensing 1

Now, I should say, if you got this, you understand something quite significant about statistics and probability. This is totally nontrivial, but it comes in very handy.

  • So, I'm going to practice this with you using a second example. In this case, you are a robot.
  • This robot lives in a world of exactly two places. There is a red place and a green place, R and G. Now, I say initially, this robot has no clue where it is, so the prior probability for either place, red or green, is 0.5.
  • It also has a sensor: it can see through its eyes, but its sensor seems to be somewhat unreliable. So, the probability of seeing red at the red grid cell is 0.8, and the probability of seeing green at the green cell is also 0.8.
  • Now, suppose the robot sees red. What are now the posterior probabilities that the robot is at the red cell given that it just saw red, and conversely, what's the probability that it's at the green cell even though it saw red? You can apply Bayes Rule and figure that out.

Answer

In this example, it gives us funny numbers.

  • The posterior for red is 0.8, and the one for green is 0.2. And it all has to do with the fact that in the beginning the robot had no clue where it was.
  • The joint for red after seeing red is 0.4. The same for green is 0.1. 0.4 + 0.1 sums to 0.5. If you normalize 0.4 by 0.5, you get 0.8, and if you normalize 0.1 by 0.5, you get 0.2.

Break down by steps

- P(at R, see R) = P(at R) x P(see R|at R) = 0.5 * 0.8 = 0.4
- P(at G, see R) = P(at G) x P(see R|at G) = 0.5 * 0.2 = 0.1
- P(see R) = P(at R, see R) + P(at G, see R) = 0.5
- P(at R|see R) = P(at R, see R)/P(see R) = 0.4/0.5 = 0.8
- P(at G|see R) = P(at G, see R)/P(see R) = 0.1/0.5 = 0.2
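The steps above can be run directly as a sketch (the dictionary representation is my own, not from the course):

```python
prior = {"R": 0.5, "G": 0.5}      # the robot has no clue where it is
p_see_red = {"R": 0.8, "G": 0.2}  # sensor model: P(see R | at cell)

joint = {cell: prior[cell] * p_see_red[cell] for cell in prior}  # 0.4 and 0.1
p_red = sum(joint.values())                                      # P(see R) = 0.5
posterior = {cell: joint[cell] / p_red for cell in joint}
# posterior["R"] ~ 0.8, posterior["G"] ~ 0.2
```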

Screenshot taken from Udacity

Quiz: Robot Sensing 2

If I now change some parameters--say the robot knows, with probability 0, that it's at the red cell, and therefore, with probability 1, that it's at the green cell as a prior--please calculate once again, using Bayes Rule, these posteriors. I have to warn you--this is a bit of a tricky case.

Answer

  • And the answer is that the prior isn't changed by the measurement: the probability is still 0 at red and 1 at green, despite the fact that the robot saw red.
  • To see this, you find that the joint of being at red and seeing red is 0 times 0.8, which is 0.
  • The same joint for green is 1 times 0.2, which is 0.2.
  • So you have to normalize 0 and 0.2. The sum of those is 0.2.
  • So let's divide 0 by 0.2, gives us 0, and 0.2 divided by 0.2 gives us 1.
  • These are exactly the numbers over here.
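The tricky part here is that a hypothesis with a zero prior stays at zero no matter what the sensor says; as a quick sketch:

```python
prior = {"R": 0.0, "G": 1.0}      # the robot is certain it's at the green cell
p_see_red = {"R": 0.8, "G": 0.2}  # same sensor model as before

joint = {cell: prior[cell] * p_see_red[cell] for cell in prior}  # 0.0 and 0.2
p_red = sum(joint.values())                                      # 0.2
posterior = {cell: joint[cell] / p_red for cell in joint}
# The measurement cannot revive a hypothesis whose prior is zero:
# posterior["R"] = 0.0, posterior["G"] = 1.0
```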

Screenshot taken from Udacity

Quiz: Robot Sensing 3

Let's change this example even further. Make this over here a 0.5 and revert back to a uniform prior. Please go ahead and calculate the posterior probabilities.

Answer

  • Now the answer is about 0.615 and 0.385.
  • These are approximate. Once again, 0.5 times 0.8 is 0.4, and 0.5 times this guy over here, 0.5, gives 0.25. Add those up: 0.65.
  • Normalizing: 0.4 divided by 0.65 gives approximately 0.615, and 0.25 divided by 0.65 gives approximately 0.385.

Break down by steps

- P(at R, see R) = P(at R) x P(see R|at R) = 0.5 * 0.8 = 0.4
- P(at G, see R) = P(at G) x P(see R|at G) = 0.5 * 0.5 = 0.25
- P(see R) = P(at R, see R) + P(at G, see R) = 0.65
- P(at R|see R) = P(at R, see R)/P(see R) = 0.4/0.65 = 0.615
- P(at G|see R) = P(at G, see R)/P(see R) = 0.25/0.65 = 0.385
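And the third variant, where the green cell also looks red half the time, is the same sketch with a new sensor model:

```python
prior = {"R": 0.5, "G": 0.5}      # back to a uniform prior
p_see_red = {"R": 0.8, "G": 0.5}  # green now reads as red half the time

joint = {cell: prior[cell] * p_see_red[cell] for cell in prior}  # 0.4 and 0.25
p_red = sum(joint.values())                                      # 0.65
posterior = {cell: joint[cell] / p_red for cell in joint}
# posterior["R"] ~ 0.615, posterior["G"] ~ 0.385
```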

Screenshot taken from Udacity

Quiz: Robot Sensing 4

  • I will now make your life really hard. Suppose there are 3 places in the world, not just 2: one red place and 2 green ones. For simplicity, we'll call them A, B, and C.
  • Let's assume that all of them have the same prior probability of 1/3, or 0.333 and so on. Let's say the robot sees red. As before, the probability of seeing red in Cell A is 0.9, the probability of seeing green in Cell B is 0.9, and the probability of seeing green in Cell C is also 0.9.
  • So what I've changed is that I've given the hidden variable--kind of like the cancer/non-cancer variable--3 states. There are not just 2 as before; it's now A, B, or C.
  • Let's solve this problem together, because it follows exactly the same recipe as before, even though it might not be obvious.
  • So let me ask you, what is the joint of being in Cell A after having seen the red color?

Answer

  • And just like before, we multiply the prior, 1/3, with the probability of seeing red in Cell A, this guy over here, 0.9, which gives you 0.3.
  • P(A,R) = P(A) * P(R|A) = 0.333 x 0.9 = 0.3

Screenshot taken from Udacity

Quiz: Robot Sensing 5

What's the joint for Cell B?

Answer

  • Well, the answer is you multiply our prior of 1/3 with the probability of seeing red in Cell B. Since we see green there with 0.9 probability, seeing red is 0.1. So 0.1 times this guy over here gives 0.033.
  • P(B,R) = P(B) * P(R|B) = 0.333 x 0.1 = 0.033

Screenshot taken from Udacity

Quiz: Robot Sensing 6

Finally, probability of C and Red. What is that?

Answer

  • And the answer is exactly the same as this over here, because the prior is the same for B and C, and those probabilities are the same for B and C, so they should be exactly the same.
  • P(C,R) = P(C) * P(R|C) = 0.333 x 0.1 = 0.033

Screenshot taken from Udacity

Quiz: Robot Sensing 7

What is our normalizer?

Answer

  • And the answer is, you just add those up.
  • P(R) = P(A,R) + P(B,R) + P(C,R) = 0.3 + 0.033 + 0.033 = 0.366

Screenshot taken from Udacity

Quiz: Robot Sensing 8

And now we calculate the desired posterior probability for all 3 possible outcomes.

Answer

  • As usual, we divide this guy over here by the normalizer, which gives us 0.818. Realize all these numbers are a little bit approximate here. Same for this guy, it's approximately 0.091. And this is completely symmetrical, 0.091. And surprise, these guys all add up to 1.
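With three cells, nothing changes in the recipe except the number of entries; a sketch of the whole computation:

```python
priors = [1/3, 1/3, 1/3]  # cells A, B, C
p_red = [0.9, 0.1, 0.1]   # P(see red | cell)

joints = [p * q for p, q in zip(priors, p_red)]  # ~0.3, ~0.033, ~0.033
normalizer = sum(joints)                         # ~0.3667
posteriors = [j / normalizer for j in joints]
# posteriors ~ [0.818, 0.091, 0.091]
```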

Screenshot taken from Udacity

Generalizing

So what have you learned?

  • In Bayes Rule, there can be more than just 2 underlying causes, like cancer/non-cancer. There might be 3, 4, or 5--any number. We can apply exactly the same math, but we have to keep track of more values.
  • In fact, the robot might also have more than just 2 test outcomes. Here it was red or green, but it could be red, green, or blue.
  • And this means that our measurement probability will be more elaborate--I have to give you more information--but the math remains exactly the same. We can now deal with very large problems that have many possible hidden causes or states of the world, and we can still apply Bayes Rule to find all of these numbers.
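The generalization described above--any number of hidden states--can be sketched as one function (my own formulation, not code from the course):

```python
def bayes(priors, likelihoods):
    """Posterior over any number of hidden states, given one measurement.

    priors: P(state) for each state.
    likelihoods: P(measurement | state) for the observed measurement.
    """
    joints = [p * l for p, l in zip(priors, likelihoods)]
    normalizer = sum(joints)  # total probability of the measurement
    return [j / normalizer for j in joints]

# The same recipe covers the 2-state cancer test and the 3-cell robot:
print(bayes([0.01, 0.99], [0.9, 0.1]))          # ~[0.083, 0.917]
print(bayes([1/3, 1/3, 1/3], [0.9, 0.1, 0.1]))  # ~[0.818, 0.091, 0.091]
```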

Quiz: Sebastian At Home

  • This quiz is actually directly taken from my life, and you'll smile when you see my problem. I used to travel a lot. It was so bad for a while that I would find myself in a bed not knowing what country I was in. I kid you not.
  • So let's say I'm gone 60% of my time and at home only 40% of my time. Now, in the summer, I live in California, and it almost never rains in the summer--say there's a 0.01 chance of rain when I'm home. Whereas in many of the countries I travel to, there's a much higher chance of rain--say 0.3 when I'm gone.
  • So let's now say I lie in my bed--here I am lying in bed--and I wake up, open the window, and see that it's raining. Let's now apply Bayes Rule: what do you think is the probability I'm home, now that I see it's raining? Just give me this one number.

Answer

  • And I get 0.0217, which is a really small number.
  • And the way I get there is by taking the prior of being home, 0.4, times the probability of rain at home, 0.01, and normalizing it by that same product plus the corresponding calculation for being gone: the probability of being gone is 0.6, and rain when I'm gone has a probability of 0.3. That results in 0.0217, or on the order of 2%--did you get this?
  • If so, you now understand something that's really interesting: you're able to look at a hidden variable and understand how a test can give you information back about it. That's really cool, because it allows you to apply the same scheme to a great many practical problems in the world--congratulations! In our next unit, which is optional, I'd like you to program all of this, so you can try the same thing in an actual programming interface and write software that implements things such as Bayes Rule.

Break down by steps

- P(home, rain) = P(home) x P(rain|home) = 0.4 * 0.01 = 0.004
- P(gone, rain) = P(gone) x P(rain|gone) = 0.6 * 0.3 = 0.18
- P(rain) = P(home, rain) + P(gone, rain) = 0.184
- P(home|rain) = P(home, rain)/P(rain) = 0.004/0.184 = 0.0217
- P(gone|rain) = P(gone, rain)/P(rain) = 0.18/0.184 = 0.978
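The breakdown above, run as code (a sketch; the rain probabilities come from the steps listed):

```python
p_home, p_gone = 0.4, 0.6
p_rain_home, p_rain_gone = 0.01, 0.3  # rain chance at home vs. abroad

joint_home = p_home * p_rain_home  # P(home, rain) ~ 0.004
joint_gone = p_gone * p_rain_gone  # P(gone, rain) ~ 0.18
p_rain = joint_home + joint_gone   # P(rain)       ~ 0.184

print(joint_home / p_rain)  # ~0.0217: seeing rain makes "home" very unlikely
```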

Screenshot taken from Udacity